
    GANBA: Generative Adversarial Network for Biometric Anti-Spoofing

    Acknowledgments: Alejandro Gomez-Alanis holds an FPU fellowship (FPU16/05490) from the Spanish Ministry of Education and Vocational Training. Jose A. Gonzalez-Lopez also holds a Juan de la Cierva-Incorporación fellowship (IJCI-2017-32926) from the Spanish Ministry of Science and Innovation. Furthermore, we acknowledge the support of Nvidia with the donation of a Titan X GPU.
    Data Availability Statement: The ASVspoof 2019 datasets were used in this study. They are publicly available at https://datashare.ed.ac.uk/handle/10283/3336 (accessed on 5 December 2021).
    Automatic speaker verification (ASV) is a voice biometric technology whose security might be compromised by spoofing attacks. To increase the robustness against spoofing attacks, presentation attack detection (PAD) or anti-spoofing systems for detecting replay, text-to-speech and voice conversion-based spoofing attacks are being developed. However, it was recently shown that adversarial spoofing attacks may seriously fool anti-spoofing systems. Moreover, the robustness of the whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed. GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very damaging adversarial spoofing attacks, and (2) it trains the PAD systems as discriminators in order to make them more robust against these types of adversarial attacks. The proposed system is able to generate adversarial spoofing attacks which can fool the complete voice biometric system. The resulting PAD discriminators of the proposed GANBA can then be used as a defense technique for detecting both original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of the ASVspoof 2019 database were employed to carry out the experiments. The experimental results show that the GANBA attacks are quite effective, outperforming other adversarial techniques when applied in white-box and black-box attack setups. In addition, the resulting PAD discriminators are more robust against both original and adversarial spoofing attacks.
    FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Project PY20_00902; PID2019-104206GB-I00 funded by MCIN/ AEI /10.13039/50110001103

    Dual-Channel Speech Enhancement Based on Extended Kalman Filter Relative Transfer Function Estimation

    This paper deals with speech enhancement in dual-microphone smartphones using beamforming along with postfiltering techniques. The performance of these algorithms relies on a good estimation of the acoustic channel and speech and noise statistics. In this work we present a speech enhancement system that combines the estimation of the relative transfer function (RTF) between microphones using an extended Kalman filter framework with a novel speech presence probability estimator intended to track the noise statistics’ variability. The available dual-channel information is exploited to obtain more reliable estimates of clean speech statistics. Noise reduction is further improved by means of postfiltering techniques that take advantage of the speech presence estimation. Our proposal is evaluated in different reverberant and noisy environments when the smartphone is used in both close-talk and far-talk positions. The experimental results show that our system achieves improvements in terms of noise reduction, low speech distortion and better speech intelligibility compared to other state-of-the-art approaches.
    Spanish MINECO/FEDER Project TEC2016-80141-P; Spanish Ministry of Education through the National Program FPU under Grant FPU15/0416

    A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality

    This letter proposes a perceptual metric for speech quality evaluation that is suitable as a loss function for training deep learning methods. This metric, derived from the perceptual evaluation of speech quality (PESQ) algorithm, is computed on a per-frame basis from the power spectra of the reference and processed speech signals. Two disturbance terms, which account for distortion once auditory masking and threshold effects are factored in, amend the mean square error (MSE) loss function by introducing perceptual criteria based on human psychoacoustics. The proposed loss function is evaluated for noisy speech enhancement with deep neural networks. Experimental results show that our metric achieves significant gains in speech quality (evaluated using an objective metric and a listening test) when compared to using MSE or other perceptual-based loss functions from the literature.
    Spanish MINECO/FEDER (Grant Number: TEC2016-80141-P); Spanish Ministry of Education through the National Program FPU (Grant Number: FPU15/04161); NVIDIA Corporation with the donation of a Titan X GP

    Selección y estimación de parámetros en sistemas de reconocimiento de voz basados en modelos ocultos de Markov

    This work is essentially concerned with the study of some aspects of acoustic modeling with hidden Markov models (HMMs), which, for now, remain the most widespread approach in the field of automatic speech recognition. Fundamentally, it addresses the problem of model selection and estimation, as well as the enrichment of the models through the incorporation of useful information that is otherwise not exploited. The main points include: 1) enrichment of the feature vectors by incorporating information about the signal energy, and the proposal of an appropriate signal processing distance; 2) a study of the possibilities of discriminative estimation in models with multiple quantization (MVQ), and the derivation of new algorithms for the discriminative design of VQ codebooks; and 3) the creation of a new HMM modeling approach (called SCMVQ) that incorporates the ideas of MVQ and semicontinuous HMM modeling.
    Thesis, Univ. Granada. Departamento de Electrónica y Tecnología de Computadore

    Adversarial Transformation of Spoofing Attacks for Voice Biometrics

    Voice biometric systems based on automatic speaker verification (ASV) are exposed to spoofing attacks which may compromise their security. To increase the robustness against such attacks, anti-spoofing or presentation attack detection (PAD) systems have been proposed for the detection of replay, synthesis and voice conversion based attacks. Recently, the scientific community has shown that PAD systems are also vulnerable to adversarial attacks. However, to the best of our knowledge, no previous work has studied the robustness of full voice biometrics systems (ASV + PAD) to these new types of adversarial spoofing attacks. In this work, we develop a new adversarial biometrics transformation network (ABTN) which jointly processes the losses of the PAD and ASV systems in order to generate white-box and black-box adversarial spoofing attacks. The core idea of this system is to generate adversarial spoofing attacks which are able to fool the PAD system without being detected by the ASV system. The experiments were carried out on the ASVspoof 2019 corpus, including both logical access (LA) and physical access (PA) scenarios. The experimental results show that the proposed ABTN clearly outperforms some well-known adversarial techniques in both white-box and black-box attack scenarios.
    Project PID2019-104206GB-I00/SRA/10.13039/50110001103

    Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation

    This article presents a recursive expectation-maximization algorithm for online multichannel speech enhancement. A deep neural network mask estimator is used to compute the speech presence probability, which is then improved by means of statistical spatial models of the noisy speech and noise signals. The clean speech signal is estimated using beamforming, single-channel linear postfiltering and speech presence masking. The clean speech statistics and speech presence probabilities are finally used to compute the acoustic parameters for beamforming and postfiltering by means of maximum likelihood estimation. This iterative procedure is carried out on a frame-by-frame basis. The algorithm integrates the different estimates in a common statistical framework suitable for online scenarios. Moreover, our method can successfully exploit spectral, spatial and temporal speech properties. Our proposed algorithm is tested in different noisy environments using the multichannel recordings of the CHiME-4 database. The experimental results show that our method outperforms other related state-of-the-art approaches in noise reduction performance, while allowing low-latency processing for real-time applications.
    Spanish MICINN/FEDER (Grant Number: PID2019-104206GB-I00); Spanish Ministry of Universities National Program FPU (Grant Number: FPU15/04161)

    Dual-channel eKF-RTF framework for speech enhancement with DNN-based speech presence estimation

    This paper presents a dual-channel speech enhancement framework that effectively integrates deep neural network (DNN) mask estimators. Our framework follows a beamforming-plus-postfiltering approach intended for noise reduction on dual-microphone smartphones. An extended Kalman filter is used for the estimation of the relative acoustic channel between microphones, while the noise estimation is performed using a speech presence probability estimator. We propose the use of a DNN estimator to improve the prediction of the speech presence probabilities without making any assumption about the statistics of the signals. We evaluate and compare different dual-channel features to improve the accuracy of this estimator, including the power and phase difference between the speech signals at the two microphones. The proposed integrated scheme is evaluated in different reverberant and noisy environments when the smartphone is used in both close- and far-talk positions. The experimental results show that our approach achieves significant improvements in terms of speech quality, intelligibility, and distortion when compared to other approaches based only on statistical signal processing.
    Spanish Ministry of Science and Innovation Project No. PID2019-104206GB-I00/AEI/10.13039/501100011033; Spanish Ministry of Universities through the National Program FPU (grant reference FPU15/04161)

    On the Application of Conformers to Logical Access Voice Spoofing Attack Detection

    Biometric systems are exposed to spoofing attacks which may compromise their security, and automatic speaker verification (ASV) is no exception. To increase the robustness against such attacks, anti-spoofing systems have been proposed for the detection of spoofed audio attacks. However, most of these systems cannot capture long-term feature dependencies and can only extract local features. While transformers are an excellent solution for the exploitation of these long-distance correlations, they may degrade local details. Conversely, convolutional neural networks (CNNs) are a powerful tool for extracting local features but not so much for capturing global representations. The conformer is a model that combines the best of both techniques, CNNs and transformers, to model both local and global dependencies, and it has been used for speech recognition, achieving state-of-the-art performance. While conformers have been mainly applied to sequence-to-sequence problems, in this work we make a preliminary study of their adaptation to a binary classification task such as anti-spoofing, with a focus on synthesis and voice-conversion-based attacks. To evaluate our proposals, experiments were carried out on the ASVspoof 2019 logical access database. The experimental results show that the proposed system can obtain encouraging results, although more research will be required in order to outperform other state-of-the-art systems.
    Project PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/501100011033; FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Project PY20_0090

    Database dependence comparison in detection of physical access voice spoofing attacks

    Antispoofing challenges are designed to work on a single database, on which we can test our model. The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative that aims to promote the consideration of spoofing and the development of countermeasures. In general, analyzing the databases individually has been the dominant approach, but this could be rather misleading. This paper provides a study of the generalization capability of antispoofing systems based on neural networks by combining different databases for training and testing. We try to give a broader view of the advantages of grouping different datasets. We delve into "replay attacks" on physical access data. This type of attack is one of the most difficult to detect, since only a few minutes of audio samples are needed to impersonate the voice of a genuine speaker and gain access to the ASV system. To carry out this task, the ASV databases from the ASVspoof challenge have been chosen and are used to obtain a more concrete and accurate view of them. We report results on these databases using different neural network architectures and set-ups.
    Project PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/501100011033; FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Project PY20_0090